Stochastic Gradient Boosting

Author

  • Jerome H. Friedman
Abstract

Gradient boosting constructs additive regression models by sequentially fitting a simple parameterized function (base learner) to current "pseudo"-residuals by least squares at each iteration. The pseudo-residuals are the gradient of the loss functional being minimized, with respect to the model values at each training data point, evaluated at the current step. It is shown that both the approximation accuracy and execution speed of gradient boosting can be substantially improved by incorporating randomization into the procedure. Specifically, at each iteration a subsample of the training data is drawn at random (without replacement) from the full training data set. This randomly selected subsample is then used in place of the full sample to fit the base learner and compute the model update for the current iteration. This randomized approach also increases robustness against overcapacity of the base learner.

1 Gradient Boosting

In the function estimation problem one has a system consisting of a random "output" or "response" variable y and a set of random "input" or "explanatory" variables x = {x_1, ..., x_n}. Given a "training" sample {y_i, x_i}_1^N of known (y, x)-values, the goal is to find a function F*(x) that maps x to y such that, over the joint distribution of all (y, x)-values, the expected value of some specified loss function Ψ(y, F(x)) is minimized:

F^*(x) = \arg\min_{F(x)} E_{y,x}\, \Psi(y, F(x)). \qquad (1)

Boosting approximates F^*(x) by an "additive" expansion of the form ...
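As a concrete illustration of the procedure described above, here is a minimal sketch of stochastic gradient boosting for squared-error loss, where the pseudo-residuals reduce to y − F(x). It assumes NumPy arrays and scikit-learn regression trees as the base learner; the function names (fit_sgb, predict_sgb) and the default parameter values are illustrative, not taken from the paper.

```python
# Minimal sketch of stochastic gradient boosting for squared-error loss,
# L(y, F) = (y - F)^2 / 2, where the pseudo-residuals are simply y - F(x).
# Each iteration fits a small regression tree to the current residuals on a
# random subsample drawn without replacement.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_sgb(X, y, n_iter=100, subsample=0.5, learning_rate=0.1,
            max_depth=3, seed=0):
    rng = np.random.default_rng(seed)
    N = len(y)
    f0 = y.mean()                          # constant initial model F_0
    F = np.full(N, f0)                     # current model values F(x_i)
    trees = []
    for _ in range(n_iter):
        residuals = y - F                  # pseudo-residuals (negative gradient)
        idx = rng.choice(N, size=int(subsample * N), replace=False)
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X[idx], residuals[idx])   # base learner sees the subsample only
        F += learning_rate * tree.predict(X)  # shrunken update of the full model
        trees.append(tree)
    return f0, learning_rate, trees

def predict_sgb(model, X):
    f0, learning_rate, trees = model
    return f0 + learning_rate * sum(t.predict(X) for t in trees)
```

Setting subsample=1.0 recovers deterministic gradient boosting; the paper reports that drawing roughly half of the training data at each iteration typically improves both accuracy and execution speed.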


Similar Articles

Multi-class Image Classification Based on Fast Stochastic Gradient Boosting

Image classification is currently one of the most active and difficult research domains. It involves two problems: image feature representation and coding, and the choice of classifier. To obtain better accuracy and running efficiency with high-dimensional features, this paper proposes a novel framework for multi-class image classifica...


ada: An R Package for Stochastic Boosting

Boosting is an iterative algorithm that combines simple classification rules of 'mediocre' performance, in terms of misclassification error rate, to produce a highly accurate classification rule. Stochastic gradient boosting provides an enhancement that incorporates a random mechanism at each boosting step, improving both the performance and the speed of generating the ensemble. ada is an R ...


Combining Bias and Variance Reduction Techniques for Regression Trees

Gradient boosting and bagging applied to regressors can reduce the error due to bias and to variance, respectively. Alternatively, Stochastic Gradient Boosting (SGB) and Iterated Bagging (IB) attempt to reduce the contributions of both bias and variance to the error simultaneously. We provide an extensive empirical analysis of these methods, along with two alternate bias-variance reduction approaches — ...
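To make the bias/variance contrast concrete, the following sketch compares bagged regression trees, which mainly reduce variance, against a stochastic gradient boosting regressor, which also attacks bias. This is only an illustration under assumed tooling (scikit-learn and its synthetic make_friedman1 benchmark), not the experimental setup of the cited paper.

```python
# Illustrative comparison (not the paper's experiments): bagging deep trees
# mainly reduces variance, while stochastic gradient boosting of shallow
# trees (subsample < 1) mainly reduces bias.
from sklearn.datasets import make_friedman1
from sklearn.ensemble import BaggingRegressor, GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

X, y = make_friedman1(n_samples=1000, noise=1.0, random_state=0)

bagging = BaggingRegressor(n_estimators=100, random_state=0)  # deep trees by default
sgb = GradientBoostingRegressor(n_estimators=100, max_depth=3,
                                subsample=0.5, random_state=0)  # SGB: random half-samples

for name, model in [("bagging", bagging), ("stochastic gradient boosting", sgb)]:
    mse = -cross_val_score(model, X, y, cv=5,
                           scoring="neg_mean_squared_error").mean()
    print(f"{name}: CV mean squared error = {mse:.2f}")
```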


Mobile Phone Customer Type Discrimination via Stochastic Gradient Boosting

Mobile phone customers face many choices regarding handset hardware, add-on services, and features to subscribe to from their service providers. Mobile phone companies are now increasingly interested in the drivers of migration to third-generation (3G) hardware and services. Using real-world data provided to the 10th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) 2006 Da...


A Combination of Boosting and Bagging for KDD Cup 2009 - Fast Scoring on a Large Database

We present the ideas and methodologies that we used to address the KDD Cup 2009 challenge on rank-ordering the probability of churn, appetency, and up-selling of wireless customers. We choose stochastic gradient boosting trees (TreeNet®) as our main classifier to handle this large, unbalanced dataset. In order to further improve the robustness and accuracy of our results, we bag a series of boos...
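The "bag of boosted models" idea in this abstract can be sketched as follows. This is a hedged illustration that substitutes scikit-learn's GradientBoostingClassifier for the commercial TreeNet software mentioned above; the function name bagged_boosting_proba and all parameter values are hypothetical.

```python
# Hedged sketch: train several stochastic gradient boosting classifiers on
# bootstrap replicates and average their predicted probabilities, producing
# scores suitable for rank-ordering customers.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.utils import resample

def bagged_boosting_proba(X, y, X_test, n_bags=5, seed=0):
    probas = []
    for b in range(n_bags):
        Xb, yb = resample(X, y, random_state=seed + b)   # bootstrap replicate
        gbm = GradientBoostingClassifier(n_estimators=200, subsample=0.5,
                                         random_state=seed + b)
        gbm.fit(Xb, yb)
        probas.append(gbm.predict_proba(X_test)[:, 1])   # P(class = 1)
    return np.mean(probas, axis=0)  # averaged scores across the bag
```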


Gradient Boosting on Stochastic Data Streams

Boosting is a popular ensemble algorithm that generates more powerful learners by linearly combining base models from a simpler hypothesis class. In this work, we investigate the problem of adapting batch gradient boosting for minimizing convex loss functions to an online setting, where the loss at each iteration is sampled i.i.d. from an unknown distribution. To generalize from batch to online, we ...
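The cited paper develops its own online algorithms with formal guarantees; as a rough illustration of the setting only, here is a naive streaming variant of batch gradient boosting in which each round fits a weak learner to pseudo-residuals on a freshly drawn i.i.d. minibatch. Squared-error loss is assumed and all names are illustrative.

```python
# Naive streaming sketch of the setting (NOT the paper's algorithm): each
# round a fresh i.i.d. minibatch arrives, a weak learner is fit to the
# pseudo-residuals of the current ensemble on that batch, and the ensemble
# grows by one weighted learner.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def online_boost(stream, n_rounds=200, step=0.05, max_depth=2):
    """stream yields (X_batch, y_batch) pairs drawn i.i.d. from the source."""
    trees, weights = [], []
    for _, (Xb, yb) in zip(range(n_rounds), stream):
        F = np.zeros(len(yb))
        for tree, w in zip(trees, weights):
            F += w * tree.predict(Xb)       # current ensemble on this batch
        residuals = yb - F                  # squared-error pseudo-residuals
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(Xb, residuals)
        trees.append(tree)
        weights.append(step)
    return trees, weights
```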



Publication year: 1999